Abstract

Histograms, box plots and cumulative distribution graphs are popular graphic representations of statistical
distributions. The main research question of this study is how college students interpret these statistical graphs
when translating graphical representations into analytical concepts in descriptive statistics.
This study is divided into two parts. The research sample included 256 college students in the first part and 187
college students in the second part. The research tools were questionnaires on the interpretation of the
graphs and on relating the graphs to other concepts in descriptive statistics. Despite the benefits learners may reap
from using multiple representations, the results reveal that some of the students had difficulties in relating multiple
representations to the same data. Educators have to take into account that only a deep understanding of each of the
representations and of their inter-relations will enable students to translate one format into another successfully.
Some of the mistakes students made may derive from the use of the intuitive rule known as Same A-Same B.
1. Introduction

Interpreting data and graphs is a central practice in science (Bowen & Roth, 2005). Due to their importance,
graphs can be considered the cornerstone of data analysis (Eshach & Kukliansky, 2016). Faced with growing 
abundance of graphic representation in research articles, newspapers and the internet, today's students are expected 
not only to build various graphs, but also to know how to interpret them. Interpretation of graphs is considered an 
essential part of statistical literacy (Gal, 2002; Gal, 2004; Watson, 2006; Aoyama, 2007). 

The current body of knowledge suggests that although students are often able to draw graphs, they frequently
perform poorly on graph interpretation. The present study focuses on how students interpret
statistical graphs such as box plots, histograms and cumulative frequency distribution graphs.

Interpreting a graph requires linking different representations of the same data.
Many studies highlight the benefits learners may reap from using multiple representations. To name a few, 
Ainsworth (1999) argued that a known representation may help understand an unknown representation, and that 
representations may complement each other. Petre, Blackwell and Green (1998) explained that mentally moving 
between representations forces learners to look beyond the borders and details of a certain representation; Dori and 
Sasson (2008) found that the ability to move back and forth between representations improved both graphic and 
conceptual abilities of learners. On the other hand, working within a multi-representational learning environment 
may pose a difficult challenge for learners in linking representations and moving flexibly between them (Even, 1998; 
Hong, Thomas and Kwong, 2000; Kukliansky & Eshach, 2014). Van der Meij and de Jong (2006) argued that a 
multiple representation environment requires the following: 1) understanding the syntax of each of the 
representations, 2) understanding which part of the topic is being represented, 3) identifying partial correspondence 
between representations, and 4) translating between representations by finding the similarities and differences in the 
two systems of representation. According to Ainsworth (2008), "multiple representations are powerful tools to help 
learners develop complex scientific knowledge. But like all powerful tools, they require careful handling and often
considerable experience before people can use them to their maximum effectiveness." Construction of mental 
representations while analyzing graphical representations includes information selection and information 
organization, parsing of symbol structures, mapping of analog structures as well as model construction and model 
inspection (Schnotz & Bannert, 2003). Graph interpretation also calls for different levels of mental representation.
According to Friel, Curcio and Bright (2001), the three levels of graph sense are: (a) reading
the data: answering questions whose answers appear directly on the graph, (b) reading between the data: interpolating
and finding relationships in the data presented in a graph, and (c) reading beyond the data: extrapolating or inferring
from the graph in order to solve more complicated questions.

Some of the mistakes made by the students interpreting graphs can be explained by the use of intuitive rules that are 
developed by learners and help them to solve different tasks in math and science. These rules are considered to be 
intuitive (Stavy et al., 2006) because the students see such explanations as self-evident and sufficient. Based on
extensive observations, Stavy and Tirosh (2000) found that many alternative conceptions are connected to the use of 
the following intuitive rules: 

a) Same A-Same B: This intuitive rule is employed when comparing two systems (1, 2). Each of the systems has two
features - A and B. It is known that feature A is identical in both systems (A1=A2), thus it is intuited that feature B
will also be identical (B1=B2). The rule is sometimes correct and sometimes not.

b) More A-More B: This intuitive rule is employed when comparing two systems (1, 2). Each of the systems has two
features - A and B. It is known that A1>A2, thus it is intuited that the same relation also holds for feature B, meaning
B1>B2. In cases where the relation is different, using this rule will lead to an error.

The approach of intuitive rules has recently been supported by brain imaging studies and reaction time research 
(Babai, Brecher, Stavy & Tirosh, 2010). 

As mentioned before, this quantitative study refers to the interpretation of graphs that students learn in descriptive 
statistics. It is divided into two parts. Specifically, in each part we address a single research question: 

Part I: How do college students deal with interpreting box plots and histograms when translating graphical 
representations into analytical concepts in descriptive statistics? 

Part II: How do college students deal with interpreting cumulative frequency distribution graphs while referring to 
the three levels of graph sense? 

2. Methods 

2.1 Part I: The Research Tool and Participants

The research tool was a questionnaire of nine true/false items dealing with the relation between a) two box plots and 
b) two histograms, and other associated concepts in descriptive statistics. The questions referred to: two box plots 
having the same ranges but different medians and interquartile ranges; two symmetric histograms for data having the 
same average but different standard deviations as shown in Figure 1. 



The questions (referred to in the Results section) for the box plots and for the histograms were presented to the 
subjects in a random order. 
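
The two contrasts built into the Part I items can be illustrated numerically. The following minimal Python sketch uses small invented data sets (they are not the data behind Figure 1, and all names are hypothetical) to show that two samples can share a range while differing in median and interquartile range, and that two symmetric samples can share an average while differing in standard deviation.

```python
import statistics


def data_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)


def iqr(data):
    """Interquartile range, using the default quartile method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


def expand(freq):
    """Turn a {value: frequency} table into a flat list of observations."""
    return [value for value, count in freq.items() for _ in range(count)]


# Invented samples for two box plots: same range, different median and IQR.
box1 = [0, 2, 3, 4, 5, 6, 7, 8, 10]
box2 = [0, 6, 6.5, 7, 7, 7.5, 8, 8, 10]

# Invented symmetric samples for two histograms: same average, different spread.
hist1 = expand({3: 1, 4: 2, 5: 10, 6: 2, 7: 1})  # tall central bar, values close to 5
hist2 = expand({1: 2, 3: 3, 5: 6, 7: 3, 9: 2})   # lower central bar, values spread out

print("ranges:", data_range(box1), data_range(box2))
print("medians:", statistics.median(box1), statistics.median(box2))
print("IQRs:", iqr(box1), iqr(box2))
print("means:", statistics.mean(hist1), statistics.mean(hist2))
print("SDs:", statistics.pstdev(hist1), statistics.pstdev(hist2))
```

In this invented example the first histogram has the taller central bar but the smaller standard deviation, because most of its values lie close to the average.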

The research sample included 256 college students, 85 of them studying business administration and 171 studying 
economics. The questionnaire was a part of their final exam, so the students’ study habits and motivation were as 
high as one could possibly expect. 

2.2 Part II: The Research Tool and Participants

The research tool for Part II was a questionnaire of eight true/false items: four examining level (a), two
examining level (b) and two examining level (c) of graph sense. All of the items
referred to the same cumulative frequency distribution graph (Figure 2, "less than" type), where the X axis showed
the number of hours that people spend on the internet daily and the Y axis the cumulative number of people.
Figure 2. The cumulative frequency distribution graph

As in the previous part, the questionnaire was a part of the final exam. The questions in the questionnaire were 
arranged randomly. 

The participants in Part II were 187 college students: 98 studying business administration and 89
studying marine sciences.

3. Results 

3.1 The Results of Part I 

The students were presented with two box plots having the same data range but different dispersion. The average
percentage of correct answers for the five box plot items was 83.6%. The students were also presented with four items
related to the two histograms for symmetrical data having the same average but different dispersion. The average
percentage of correct answers for the histogram items was 72.5%. The items and the percentage of correct answers for
each of them are shown below in Table 1:

Table 1. The percentages (%) of correct answers for all of the items in the two groups

Item (T = true item, F = false item)                                        Business administration  Economics
1. (T) The interquartile range of data in box plot (1) is higher            71                       74
   than in box plot (2).
2. (T) The median of the data in box plot (2) is higher than in             80                       86
   box plot (1).
3. (F) The distribution of the data in box plot (1) is positively           80                       77
   skewed and the distribution of the data in box plot (2) is
   negatively skewed.
4. (F) The range of data in both box plots is identical, so they            86                       89
   have the same averages.
5. (F) The range of data in both box plots is identical, so they            95                       95
   have the same standard deviations.
6. (T) The averages of data in both histograms are identical and            42                       45
   the standard deviation of data in histogram (2) is higher than
   in histogram (1).
7. (F) The average of data in histogram (2) is higher than in               75                       81
   histogram (1) and the standard deviation of data in histogram
   (1) is higher than in histogram (2).
8. (T) The median of the data in histogram (2) is higher than in            90                       91.5
   histogram (1).
9. (F) If the averages of data in both histograms are identical,            75                       76.5
   then the standard deviations are identical as well.
The average percentage of correct answers was 77.1% for the business administration students and 79.4% for those who
studied economics. A t-test for related samples, pairing the two groups' percentages on each of the nine items,
revealed a significant difference between the mean percentages of correct answers of the business administration and
economics students (t (8) =2.48, *p<0.05). The most difficult question for the students of both groups (item 6)
required the evaluation of the data dispersion from the histogram.
The hypothesis of guessing (p=50%) was tested and rejected for each of the nine items (*p<0.05).
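
The reported t statistic can be reproduced from Table 1. The sketch below assumes that the related-samples t-test paired the two groups' percentages item by item (nine pairs, hence eight degrees of freedom); the variable names are hypothetical, and 2.306 is the standard two-sided 5% critical value of t with 8 degrees of freedom.

```python
import math
import statistics

# Percentages of correct answers per item, copied from Table 1 (items 1-9).
business = [71, 80, 80, 86, 95, 42, 75, 90, 75]
economics = [74, 86, 77, 89, 95, 45, 81, 91.5, 76.5]

# Assumption: the test pairs the two groups on each item.
diffs = [e - b for b, e in zip(business, economics)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
t_stat = mean_diff / (sd_diff / math.sqrt(n))

T_CRIT_DF8 = 2.306  # two-sided 5% critical value for 8 degrees of freedom

print(f"group means: {statistics.mean(business):.1f}% vs {statistics.mean(economics):.1f}%")
print(f"t({n - 1}) = {t_stat:.2f}, significant at 5%: {abs(t_stat) > T_CRIT_DF8}")
```

Under this pairing assumption the sketch gives group means of 77.1% and 79.4% and a t value of about 2.48, in agreement with the reported figures.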

3.2 The Results of Part II 

Table 2 presents the percentages (%) of correct answers for all of the items in the two student groups referring to the 
three levels of graph sense: (a) reading the data, (b) reading between the data and (c) reading beyond the data. 


Table 2. The percentages (%) of correct answers for all of the items in the two groups

Item (level of graph sense; T = true item, F = false item)                  Business administration  Marine sciences
1. (a) The median time of surfing the internet is 6 hours daily (F)         85                       86
2. (a) 100 people surf the internet more than 9 hours daily (F)             79                       85
3. (a) 30% of the people surf the internet up to 5 hours daily (F)          87                       90
4. (a) 130 people surf the internet up to 10 hours daily (T)                89                       92
5. (b) 110 people surf the internet between 5 and 12 hours daily (T)        75                       80
6. (b) The number of people surfing the internet between 10 and 15          71                       77
   hours is smaller than the number of people surfing the internet
   between 9 and 10 hours (T)
7. (c) The average time of surfing the internet is 7.567 hours daily (T)    60                       65
8. (c) The distribution of the time surfing the internet daily is           62                       66
   positively skewed (F)


The average percentage of correct answers for the cumulative frequency distribution items was 78% for all the
participants: 80.1% for the marine science students and 76% for the business administration students.


A non-significant Levene's test for equality of variances indicated that the variances of the percentages of correct
answers for the three levels were equal (F=1.17, p=0.39). The two-way ANOVA revealed significant differences
between the mean percentages of correct answers for the three levels (F=742.78, ***p<0.001). The post-hoc multiple
comparisons revealed significant differences between the means of all the pairs (***p<0.001). The marine science
students performed significantly better than the business administration students (F=70.22, *p<0.05). The most
difficult questions were those at level (c), where the students had to build a frequency distribution table from the
graph in order to calculate the average or to determine the shape of the distribution. Because true/false questions
raise a justified concern about guessing, the hypothesis of guessing was tested for each of the eight items and rejected
(*p<0.05).
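
The guessing check can be sketched as a one-sample test of each proportion against 50%. The paper does not state which test was used, so the sketch below assumes a two-sided z-test (normal approximation) applied separately to each group, using the rounded percentages of Table 2 and the group sizes of 98 and 89; the function name is hypothetical.

```python
import math


def guessing_z_test(percent_correct, n, p0=0.5):
    """Two-sided one-sample z-test of H0: P(correct) = p0 (pure guessing)."""
    p_hat = percent_correct / 100
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


BUSINESS_N, MARINE_N = 98, 89  # group sizes reported for Part II

# Rounded percentages of correct answers, copied from Table 2 (items 1-8).
business = [85, 79, 87, 89, 75, 71, 60, 62]
marine = [86, 85, 90, 92, 80, 77, 65, 66]

results = {}
for item, (b, m) in enumerate(zip(business, marine), start=1):
    results[item] = (guessing_z_test(b, BUSINESS_N), guessing_z_test(m, MARINE_N))
    (zb, pb), (zm, pm) = results[item]
    print(f"item {item}: business z={zb:.2f}, p={pb:.4f}; marine z={zm:.2f}, p={pm:.4f}")

# True when every item in both groups rejects guessing at the 5% level.
all_rejected = all(p < 0.05 for pair in results.values() for _, p in pair)
print("guessing rejected for every item:", all_rejected)
```

Under these assumptions every item rejects the guessing hypothesis at the 5% level. The narrowest margin is item 7 in the business administration group (60% of 98 students, z of about 1.98), so a different test or unrounded percentages could shift that particular p-value slightly.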

4. Discussion 

Graphic representations are essential communication tools (Eshach, 2014). Graphs can help to visualize and
interpret the variation, patterns, and trends within the data. The present study dealt with students' ability to
interpret graphs: box plots, histograms and cumulative frequency distribution graphs. Part I of this study examined
the translation of graphical representations, such as box plots and histograms, into analytical concepts in descriptive
statistics. Part II examined the interpretation of cumulative frequency distribution graphs. The
average of correct answers for the whole questionnaire was 78%, meaning that students did not succeed in about one
fifth of the questions. In both parts of this study, there were students who had difficulties in translating the graph into
analytical concepts while linking different representations. It should be noted that not only novice students but
also experts have difficulties in reading graphs (e.g. Roth & Bowen, 2001; Glazer, 2011).

The most difficult item in Part I was item 6, where students may have interpreted the greater height of a histogram
as greater dispersion. They did not understand that if most of the data is close to the average, the standard deviation
will be smaller. The difficulties students had in items 4, 5 and 9 in Part I can be explained through intuitive rules
theory (Stavy & Tirosh, 2000). It is possible that students here used the intuitive rule Same A-Same B. Responses using
the intuitive rules are given with great confidence, often persisting despite formal learning. These rules are
considered to be intuitive (Stavy et al., 2006) because the students see such explanations as self-evident and sufficient.
Furthermore, responses in line with intuitive rules are made quickly (Babai, Brecher, Stavy & Tirosh, 2006). In Part
II, the results revealed that more difficulties were observed in the higher-level questions, where students had to
convert one representation into another several times. For example, in item 7 in Part II, the first step is to convert the
graph into a frequency distribution table and the second is to compute the average, meaning that the visual
representation is first converted into an analytic representation, and then into a numeric one. A broad range of factors
influence learning (Ainsworth, 2006), so only deep understanding of each of the representations and the link between
them can enable students to successfully convert one representation into another. The cognitive challenge students
face in this process, simultaneously representing the same data in a variety of forms and moving from one
representation to another, may explain the difficulties identified in this study, which concur with other studies
of the simultaneous handling of multiple representations (e.g. Schoenfeld, Smith & Arcavi, 1993; Arcavi,
2003; Herman, 2007).
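
The two-step conversion described for item 7 can be sketched as follows. The cumulative counts below are invented for illustration and are not the data of Figure 2; the function names are hypothetical, and the sketch assumes linear interpolation within classes and class midpoints as representative values, as is common for grouped data.

```python
# Hypothetical "less than" cumulative graph: (upper class bound in hours, cumulative count).
# These numbers are invented for illustration; they are NOT the data of Figure 2.
OGIVE = [(0, 0), (3, 20), (6, 60), (9, 110), (12, 140), (15, 150)]


def cumulative_at(x, points=OGIVE):
    """Reading the data: cumulative count at x, with linear interpolation."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (c1 - c0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def count_between(lo, hi, points=OGIVE):
    """Reading between the data: number of people with lo < hours <= hi."""
    return cumulative_at(hi, points) - cumulative_at(lo, points)


def frequency_table(points=OGIVE):
    """Step 1: difference the cumulative counts to get class frequencies."""
    return [((x0, x1), c1 - c0) for (x0, c0), (x1, c1) in zip(points, points[1:])]


def grouped_mean(points=OGIVE):
    """Step 2: mean from the frequency table, using class midpoints."""
    table = frequency_table(points)
    total = sum(freq for _, freq in table)
    return sum((lo + hi) / 2 * freq for (lo, hi), freq in table) / total


def interpolated_median(points=OGIVE):
    """Median: the x value at which the cumulative count reaches half the total."""
    half = points[-1][1] / 2
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= half <= c1:
            return x0 + (half - c0) / (c1 - c0) * (x1 - x0)


print("people up to 9 hours:", cumulative_at(9))
print("people between 3 and 12 hours:", count_between(3, 12))
print("frequency table:", frequency_table())
print("grouped mean:", grouped_mean())
print("median:", interpolated_median())
```

Reading a single cumulative value corresponds to level (a), subtracting two cumulative values to level (b), and rebuilding the frequency table before computing the mean to level (c), which is why the last kind of question involves the most conversions between representations.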

Educators have to take this into account in designing better and more efficient learning environments. For example, 
students’ exposure to multiple representations of the same distribution can be useful for better interpretation of 
statistical graphs (Lem, Onghena, Verschaffel & Van Dooren, 2012). 